Suchin Gururangan

Jack

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Sep 17, 2025

Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder

Feb 19, 2025

Self-Generated Critiques Boost Reward Modeling for Language Models

Nov 25, 2024

The Llama 3 Herd of Models

Jul 31, 2024

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024

Language models scale reliably with over-training and on downstream tasks

Mar 13, 2024

LESS: Selecting Influential Data for Targeted Instruction Tuning

Feb 20, 2024

Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

Jan 19, 2024

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

Jan 16, 2024